[FLINK-36794] [cdc-composer/cli] pipeline cdc connector support multiple data sources by linjianchang · Pull Request #3844 · apache/flink-cdc

linjianchang · 2025-01-08T11:06:07Z

Add support for multiple data sources to pipeline CDC connectors.
The relevant design and verification documents are as follows:
https://www.yuque.com/oniucium/lfk1te/ocp6m13kpgh1x9pg?singleDoc#

ChaomingZhangCN · 2025-01-09T02:43:00Z

+                stream = stream.union(streamBranch);
+            }
+        }
+        boolean isParallelMetadataSource = dataSource.isParallelMetadataSource();


I think multiple data sources should be regarded as parallelized.
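A minimal sketch of the rule suggested above, assuming the composer can collect each source's `isParallelMetadataSource()` value into a list. `MetadataParallelism` and its method are hypothetical names for illustration, not code from this PR:

```java
import java.util.List;

/** Hypothetical helper: decides whether the combined (unioned) source is a parallel metadata source. */
final class MetadataParallelism {

    private MetadataParallelism() {}

    /**
     * @param perSourceFlags the isParallelMetadataSource() value reported by each configured source
     * @return true if the combined source should be treated as a parallel metadata source
     */
    static boolean isParallelMetadataSource(List<Boolean> perSourceFlags) {
        if (perSourceFlags.isEmpty()) {
            throw new IllegalArgumentException("At least one source is required");
        }
        if (perSourceFlags.size() > 1) {
            // Assumption: each unioned branch may emit schema change events on its own,
            // so more than one source is always treated as parallel.
            return true;
        }
        // A single source keeps its original behavior.
        return perSourceFlags.get(0);
    }
}
```

With this rule, a single non-parallel MySQL source behaves as before, while two or more sources always take the parallel path.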

Already modified

ChaomingZhangCN · 2025-01-09T02:50:41Z

+
+```yaml
+source:
+   type: mysql_mutiple


Should we use a new key like 'sources' to describe multiple sources? The '_multiple' suffix in the value seems a bit odd, because the YAML content then does not correspond one-to-one with the PipelineDef.

Already modified

```yaml
sources:
  - type: mysql
    name: mysql-instance-00
    hostname: localhost
    port: 3306
    ....
  - type: mysql
    name: mysql-instance-01
    hostname: localhost
    port: 3307
    ....
```

And the corresponding PipelineDef looks like this:

```java
public class PipelineDef {
    @Nullable private List<SourceDef> sources;
    private final SourceDef source;
    ...
}
```

If `sources` is not null, we use these data sources; otherwise we use `source` to build the DataStream. This way, existing usage will not be affected.
I want to hear your opinion. @yuxiqian
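The fallback rule in this proposal can be sketched as follows. `SourceDefSketch` and `PipelineDefSketch` are simplified, hypothetical stand-ins for `SourceDef` and `PipelineDef` (plain string maps instead of the real configuration type), and treating an empty `sources` list as absent is an assumption:

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical, simplified source definition: type, optional name, and string options. */
final class SourceDefSketch {
    final String type;
    final String name;
    final Map<String, String> options;

    SourceDefSketch(String type, String name, Map<String, String> options) {
        this.type = Objects.requireNonNull(type, "type");
        this.name = name;
        this.options = Map.copyOf(options);
    }
}

/** Hypothetical pipeline definition with a nullable "sources" list and a legacy single "source". */
final class PipelineDefSketch {
    private final List<SourceDefSketch> sources; // null when the legacy "source" key is used
    private final SourceDefSketch source;        // may be null when "sources" is set

    PipelineDefSketch(List<SourceDefSketch> sources, SourceDefSketch source) {
        // Assumption: an empty "sources" list is treated the same as a missing one.
        boolean hasSources = sources != null && !sources.isEmpty();
        if (!hasSources && source == null) {
            throw new IllegalArgumentException("Either 'source' or 'sources' must be defined");
        }
        this.sources = hasSources ? List.copyOf(sources) : null;
        this.source = source;
    }

    /** "sources" takes precedence; otherwise fall back to the single legacy "source". */
    List<SourceDefSketch> effectiveSources() {
        return sources != null ? sources : List.of(source);
    }
}
```

Because the composer would only ever iterate over `effectiveSources()`, existing single-source YAML files keep producing exactly one DataStream.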

I do like @ChaomingZhangCN's proposed syntax for fully multiple data sources; it's intuitive and expressive. But it might be a chore if users just want to connect to a MySQL cluster with multiple servers, as they would have to copy all identical configurations into each source definition.
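One way to reduce that duplication, sketched here purely as a hypothetical (nothing in this PR defines a shared-options block), is to merge a common set of options into each per-source map, with per-source values winning:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: applies shared options to every source, letting per-source entries override them. */
final class SourceOptionMerger {

    private SourceOptionMerger() {}

    static List<Map<String, String>> merge(
            Map<String, String> shared, List<Map<String, String>> perSource) {
        List<Map<String, String>> merged = new ArrayList<>();
        for (Map<String, String> overrides : perSource) {
            // Start from the shared defaults so identical settings are written only once.
            Map<String, String> options = new LinkedHashMap<>(shared);
            // Per-source values (e.g. port) replace any shared value with the same key.
            options.putAll(overrides);
            merged.add(options);
        }
        return merged;
    }
}
```

Each source then only needs to state what actually differs, such as its port, while credentials and table patterns live in one place.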

@linjianchang's solution currently seems MySQL-specific, especially for multi-host clusters. It cannot be extended to heterogeneous sources (like concatenating data from different DBMSs), or to cases where one wants different configs for each node. These cases don't exist for now, since all we have is the MySQL source connector, but as we're modifying the composer and YAML API (instead of the MySQL connector itself), such possibilities should be discussed more carefully.
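To make the limitation concrete, here is a hypothetical sketch of a MySQL-specific expansion, assuming the `hostname` option holds a comma-separated list of `host:port` pairs. The constant names mirror those visible in the diff, but this is a guess at the approach, not the PR's actual code:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical: expands one MySQL config with several "host:port" entries into one config per server. */
final class MultiHostExpander {
    private static final String HOST_NAME = "hostname";
    private static final String PORT = "port";
    private static final String COLON = ":";

    private MultiHostExpander() {}

    static List<Map<String, String>> expand(Map<String, String> config) {
        String hosts = config.get(HOST_NAME);
        if (hosts == null || hosts.isBlank()) {
            throw new IllegalArgumentException("'" + HOST_NAME + "' is required");
        }
        List<Map<String, String>> result = new ArrayList<>();
        for (String entry : hosts.split(",")) {
            String hostAndPort = entry.trim();
            if (hostAndPort.isEmpty()) {
                continue; // tolerate stray commas such as "a,,b"
            }
            // Every other option is copied verbatim, so all servers share one configuration.
            Map<String, String> perHost = new LinkedHashMap<>(config);
            int idx = hostAndPort.lastIndexOf(COLON);
            if (idx > 0) {
                perHost.put(HOST_NAME, hostAndPort.substring(0, idx));
                perHost.put(PORT, hostAndPort.substring(idx + 1));
            } else {
                // No explicit port: keep the shared "port" option, if any.
                perHost.put(HOST_NAME, hostAndPort);
            }
            result.add(perHost);
        }
        return result;
    }
}
```

Since only host and port vary, there is no place to express a different connector type or per-node settings, which is the gap described above.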

As for multiple sources in the pipeline itself, I remember the idea was informally discussed with @leonardBang and @PatrickRen a long time ago, and the conclusion was that running multiple sources in one single job actually makes the pipeline more fragile, since any single-point failure could easily escalate and cause a global failover. Things might have changed since then, but we still need to hear from senior developers on this.


It has been modified according to the comments; please review it again, thanks!

ChaomingZhangCN · 2025-01-09T02:51:11Z

+    private static final String HOST_NAME = "hostname";
+    private static final String PORT = "port";
+    private static final String COLON = ":";
+    private static final String MUTIPLE = "_mutiple";


Should be _multiple.

Already modified

Hi @ChaomingZhangCN, please help review when you have time, thanks!

lvyanquan · 2025-12-05T09:28:07Z

+
+   - type: mysql
+       name: MySQL multiple Source2
+     hostname: 127.0.0.2


It is essential to verify state compatibility.
Would adding or removing a Source from an existing job lead to state compatibility issues, or could reordering the Sources result in state inconsistency? Such incompatible operations must be surfaced as explicit errors rather than handled silently.
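A minimal sketch of the kind of restore-time guard this asks for, assuming the job records the ordered list of source identifiers (for example, the `name` of each source) alongside its state. `SourceStateCompatibility` is a hypothetical name, not part of Flink CDC:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical guard: fails loudly if sources were added, removed, or reordered since the state was taken. */
final class SourceStateCompatibility {

    private SourceStateCompatibility() {}

    static void check(List<String> restoredSourceIds, List<String> currentSourceIds) {
        Set<String> restored = new LinkedHashSet<>(restoredSourceIds);
        Set<String> current = new LinkedHashSet<>(currentSourceIds);
        if (restored.size() != restoredSourceIds.size() || current.size() != currentSourceIds.size()) {
            throw new IllegalStateException("Source identifiers must be unique");
        }
        List<String> added = new ArrayList<>(current);
        added.removeAll(restored);
        List<String> removed = new ArrayList<>(restored);
        removed.removeAll(current);
        if (!added.isEmpty() || !removed.isEmpty()) {
            throw new IllegalStateException(
                    "Incompatible source change on restore: added=" + added + ", removed=" + removed);
        }
        // Same set of sources, but a different order is also rejected.
        if (!restoredSourceIds.equals(currentSourceIds)) {
            throw new IllegalStateException(
                    "Sources were reordered: restored=" + restoredSourceIds + ", current=" + currentSourceIds);
        }
    }
}
```

An alternative design is to give each source branch a stable operator `uid` derived from its name, since Flink matches operator state by uid rather than by position; whether reordering then becomes safe depends on how the composer assigns those uids.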

@lvyanquan Thanks for the review! I have already made the modifications and tested them. Please refer to the following document:
https://www.yuque.com/oniucium/lfk1te/yruqcmf9vbz4rxzv?singleDoc# "Multiple source: restore-from-state verification"


github-actions · 2026-04-18T00:12:57Z

This pull request has been automatically marked as stale because it has not had recent activity for 120 days. It will be closed in 60 days if no further activity occurs.

github-actions bot added the docs, composer, cli, and mysql-pipeline-connector labels on Jan 8, 2025

ChaomingZhangCN reviewed Jan 9, 2025


linjianchang force-pushed the master-36794 branch 4 times, most recently from f8524d7 to 994d17a on January 17, 2025 02:49

linjianchang force-pushed the master-36794 branch from 994d17a to c501ce8 on April 25, 2025 01:32

github-actions bot added the values-pipeline-connector, oracle-cdc-connector, and debezium labels and removed the mysql-pipeline-connector label on Apr 25, 2025

linjianchang force-pushed the master-36794 branch from c501ce8 to 8aad734 on April 25, 2025 06:56

github-actions bot added the mysql-pipeline-connector label on Apr 25, 2025

linjianchang closed this May 23, 2025

linjianchang force-pushed the master-36794 branch from 8aad734 to cc561c0 on May 23, 2025 01:44

linjianchang reopened this May 23, 2025

github-actions bot removed the oracle-cdc-connector and debezium labels on May 23, 2025

linjianchang requested review from ChaomingZhangCN and yuxiqian October 10, 2025 08:27

linjianchang force-pushed the master-36794 branch from f6862fe to fbc1b2b on December 5, 2025 03:38

linjianchang closed this Dec 5, 2025

linjianchang force-pushed the master-36794 branch from fbc1b2b to 8edc345 on December 5, 2025 06:13

linjianchang reopened this Dec 5, 2025

lvyanquan reviewed Dec 5, 2025


linjianchang force-pushed the master-36794 branch from 59898ec to 2225400 on December 18, 2025 06:08

github-actions bot added the common and runtime labels on Dec 18, 2025

Commit dd6e0b1: [FLINK-36794] [cdc-composer/cli] pipeline cdc connector support multiple data sources

linjianchang force-pushed the master-36794 branch from 57eb8fa to dd6e0b1 on December 18, 2025 06:15

github-actions bot added the Stale label on Apr 18, 2026

